Abstract

Haskell’s deriving mechanism supports the automatic generation
of instances for a number of classes. The Haskell 98 Report
only specifies how to generate instances for the Eq, Ord, Enum,
Bounded, Show, and Read classes. The description of how to generate
instances is largely informal. The generation of instances
imposes restrictions on the shape of datatypes, depending on the
particular class to derive. As a consequence, the portability of
instances across different compilers is not guaranteed.

We propose a new approach to Haskell’s deriving mechanism,
which allows users to specify how to derive arbitrary class
instances using standard datatype-generic programming techniques.
Generic functions, including the methods from six standard Haskell 
98 derivable classes, can be specified entirely within Haskell 98 
plus multi-parameter type classes, making them lightweight and 
portable. We can also express Functor, Typeable, and many other 
derivable classes with our technique. We implemented our deriving 
mechanism together with many new derivable classes in the Utrecht 
Haskell Compiler. 

Categories and Subject Descriptors D.1.1 [Programming Techniques]:
Functional Programming

General Terms Languages 


… category theory (Backhouse et al. 1999), passing through
dedicated languages (Jansson and Jeuring 1997), language extensions
and pre-processors (Hinze et al. 2007; Löh 2004) until the flurry
… 2008). In this evolution, expressivity has not always increased:
many generic programming libraries of today still cannot compete 
with the Generic Haskell pre-processor, for instance. The same
applies to performance, as libraries tend to do little regarding code
optimization, whereas meta-programming techniques such as
Template Haskell (Sheard and Peyton Jones 2002) can generate
near-optimal code. Instead, generic programming techniques seem to
evolve in the direction of better availability and usability: it should
be easy to define generic functions and it should be trivial to use
them. Certainly some of the success of the Scrap Your Boilerplate
(SYB) approach … is due to its availability: it comes with the
Glasgow Haskell Compiler (GHC), the main Haskell compiler, which
can even derive the necessary type class instances to make
everything work without clutter.

To improve the usability of generics in Haskell, we believe
a tighter integration with the compiler is necessary. In fact, the
Haskell 98 standard already contains some generic programming,
in the form of derived instances (Peyton Jones et al. 2003, Chapter
10). Unfortunately, the report does not formally specify how to
derive instances, and it restricts the classes that can be derived to
only six (Eq, Ord, Enum, Bounded, Show, and Read). GHC has
long since extended these with Data and Typeable (the basis of
SYB), and more recently with Functor, Foldable and Traversable.
Due to the lack of a unifying formalism, these extensions are not
easily mimicked in other compilers, which need to reimplement the
instance code generation mechanism.

To address these issues, we propose an approach to specifying 
how to derive an instance of a class, together with new behavior for 
the deriving mechanism in Haskell to automatically derive such 
a class. To allow for portability across compilers, our approach 
requires only Haskell 98 with multi-parameter type classes and 
support for a new compiler pragma. Specifically, our contributions
are:


• We describe a new datatype-generic programming library for 
Haskell. Although similar in many aspects to other approaches, 
our library requires almost no extensions to Haskell 98; the 
most significant requirement is support for multi-parameter 
type classes. 

• We show how this library can be used to extend the deriving 
mechanism in Haskell, and provide sample derivings, notably 
for the Functor class. 

• We provide a detailed description of how the representation for 
a datatype is generated. In particular, we can represent almost 
all Haskell 98 datatypes. 

• We provide a fully functional implementation of our library 
in the Utrecht Haskell Compiler (UHC, Dijkstra et al. 2009).
Many useful generic functions are defined using generic deriving
in the compiler.

We also provide a package which compiles both in UHC and GHC,
showing in detail the code that needs to be added to the compiler, the
code that should be generated by the compiler, and the code that is
portable between compilers.¹

The remainder of this paper is structured as follows: first we
give a brief introduction to generic programming in Haskell (Section
2), which also introduces the particular library we use. We proceed
to show how to define generic functions (Section 3), and then
describe the necessary modifications to the compiler for supporting
our approach (Section 4). Finally, we discuss alternative designs
(Section 5), review related work (Section 6), propose future work
(Section 7) and conclude in Section 8.

1 http://dreixel.net/research/code/gdmh.tar.gz

2. Generic programming 

We use the generic function encode as a running example throughout
this paper. This function transforms a value into a sequence of
bits:

data Bit = O | I  -- O and I stand for the bits 0 and 1
class Encode a where
  encode :: a -> [Bit]

We want the user to be able to write

data Exp = Const Int | Plus Exp Exp
  deriving (Show, Encode)

and to use encode like

test :: [Bit]
test = encode (Plus (Const 1) (Const 1))

This should be all that is necessary to use encode. The user should 
need no further knowledge of generics, and encode can be used in 
the same way as show, for instance. 

Behind the scenes, the compiler generates an instance for
Encode Exp based on a generic specification of instances of class
Encode. There are several ways to specify such an instance,
both using code generation and datatype-generic approaches. We
choose a datatype-generic approach because it is type-safe and
elegant (Hinze et al. 2007). We will discuss alternative designs and
motivate our choice in more detail in Section 5. For now we proceed
to describe our new generic programming library. The three
basic ingredients for generic programming, as described by Hinze
and Löh (2009), are:

1. Support for overloaded functions 

2. A run-time type representation 

3. A generic view on data 

Since we use Haskell, (1) is easy: an overloaded (ad-hoc
polymorphic) function is a method of a type class. For (2), we introduce a
type representation similar to the one used in the regular (Van
Noort et al. 2008) and instant-generics (Chakravarty et al.
2009) libraries, in Section 2.1. For (3), we again use type classes
to encode embedding-projection pairs for user-defined datatypes in
Section 2.3.

2.1 A run-time type representation 

The choice of a run-time type representation affects not only the 
compiler writer but also the expressiveness of the whole approach. 
A simple representation is easier to derive, but might not allow 
the definition of some generic functions. More complex
representations are more expressive, but require more work for the automatic
derivation of instances. 

We present a set of representation types that tries to balance
these factors. We use the common sum-of-products representation
without explicit fixpoints but with explicit abstraction over a
single parameter. Therefore, representable types are functors, and we
can compose types. Additionally, we provide useful types for
encoding meta-information (such as constructor names) and tagging
arguments to constructors. We show examples of how these
representation types are used in Section 2.4.

The basic ingredients of the sum-of-products representation
types are:

data U1 p = U1
data (+) φ ψ p = L1 {unL1 :: φ p} | R1 {unR1 :: ψ p}
data (×) φ ψ p = φ p × ψ p

We encode lifted sums with (+) and lifted products with (×).
Nullary products are encoded with lifted unit (U1).²
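To make this encoding concrete, here is a minimal sketch in GHC Haskell. It assumes the ASCII spellings :+: and :*: for the paper's lifted sum and product (type operators need the TypeOperators extension), and uses a hypothetical RepBool that views Bool as a sum of two nullary products.

```haskell
{-# LANGUAGE TypeOperators #-}

-- Lifted unit, sum and product; p is the (here unused) functor parameter.
data U1 p = U1
data (f :+: g) p = L1 { unL1 :: f p } | R1 { unR1 :: g p }
data (f :*: g) p = f p :*: g p

infixr 5 :+:
infixr 6 :*:

-- Hypothetical representation of Bool: False is the left alternative,
-- True the right one, and neither constructor has fields (so both are U1).
type RepBool = U1 :+: U1

fromBool :: Bool -> RepBool p
fromBool False = L1 U1
fromBool True  = R1 U1

toBool :: RepBool p -> Bool
toBool (L1 U1) = False
toBool (R1 U1) = True
```

Since fromBool and toBool are mutually inverse, Bool and RepBool p are isomorphic for any p, which is exactly the property the embedding-projection pairs of Section 2.3 rely on.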

The type variable p is present in all representation types: it 
represents the parameter over which we abstract. We use an explicit 
combinator to mark the occurrence of this parameter:

newtype Par1 p = Par1 {unPar1 :: p}

As our representation is functorial, we can encode composition.
Although we cannot express this in the kind system, we require the
first argument of composition to be a representable type constructor.
The second argument can only be the parameter, a recursive
occurrence of a functorial datatype, or again a composition. We use
Rec1 to represent recursion, and (∘) for composition:

newtype Rec1 φ p = Rec1 {unRec1 :: φ p}
newtype (∘) φ ψ p = Comp1 (φ (ψ p))

PolyP (Jansson and Jeuring 1997) treats composition in a similar way.
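The sketch below, again assuming GHC with TypeOperators and ASCII spellings (:*: for ×, :.: for ∘), shows why composition is needed. The rose-tree type Rose is a hypothetical example, not taken from the paper: its list of children is applied to Rose p rather than to p, so the list is composed with Rec1 Rose, and fmap pushes Rec1 inside the list.

```haskell
{-# LANGUAGE TypeOperators #-}

data (f :*: g) p = f p :*: g p
infixr 6 :*:

-- Parameter, recursion into a functorial type, and composition.
newtype Par1 p = Par1 { unPar1 :: p }
newtype Rec1 f p = Rec1 { unRec1 :: f p }
newtype (f :.: g) p = Comp1 { unComp1 :: f (g p) }

-- Hypothetical rose tree: the children list holds Rose p, not p.
data Rose p = Rose p [Rose p] deriving (Eq, Show)

-- The outer functor ([]) is an ordinary datatype; the inner one
-- (Rec1 Rose) is a representation type.
type RepRose = Par1 :*: ([] :.: Rec1 Rose)

fromRose :: Rose p -> RepRose p
fromRose (Rose x ts) = Par1 x :*: Comp1 (fmap Rec1 ts)

toRose :: RepRose p -> Rose p
toRose (Par1 x :*: Comp1 ts) = Rose x (fmap unRec1 ts)
```

The conversion relies on the outer type ([] here) being a Functor, which is the same requirement the paper places on the first argument of (∘).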

Finally, we have two types for representing meta-information
and tagging:

newtype K1 ι γ p = K1 {unK1 :: γ}
newtype M1 ι γ φ p = M1 {unM1 :: φ p}

We use K1 for tagging and M1 for storing meta-information. The
role of the ι parameter in these types is made explicit by the
following type synonyms:

data D
data C
data S
data R
data P
type D1 = M1 D
type C1 = M1 C
type S1 = M1 S
type Rec0 = K1 R
type Par0 = K1 P
We use Rec0 to tag occurrences of (possibly recursive) types of
kind * and Par0 to mark additional parameters of kind * (other than
p). For meta-information, we use D1 for datatype information, C1
for constructor information and S1 for record selector information.
We group five combinators into two because in many generic
functions the behavior is independent of the meta-information or tags.
In this way, fewer trivial cases have to be given. We present the
meta-information associated with M1 in detail in the next section.
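Because the tag ι is never inspected, a generic function needs only one instance for M1 and one for K1 to cover all five synonyms. The sketch below illustrates this with a hypothetical function fields1 that counts the constant fields of a value. It assumes GHC with TypeOperators, and the meta-information names ExampleType, MkExampleCon and FirstField are made up for illustration.

```haskell
{-# LANGUAGE TypeOperators #-}

data U1 p = U1
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = f p :*: g p
infixr 5 :+:
infixr 6 :*:

newtype K1 i c p = K1 { unK1 :: c }      -- constants, tagged by i
newtype M1 i c f p = M1 { unM1 :: f p }  -- meta-information, tagged by i

-- Empty tag types and the five synonyms.
data D
data C
data S
data R
data P
type D1 = M1 D
type C1 = M1 C
type S1 = M1 S
type Rec0 = K1 R
type Par0 = K1 P

-- Hypothetical generic function: count the K1 leaves of a value.
class Fields1 f where
  fields1 :: f p -> Int

instance Fields1 U1 where
  fields1 _ = 0
instance (Fields1 f, Fields1 g) => Fields1 (f :+: g) where
  fields1 (L1 x) = fields1 x
  fields1 (R1 y) = fields1 y
instance (Fields1 f, Fields1 g) => Fields1 (f :*: g) where
  fields1 (x :*: y) = fields1 x + fields1 y
-- One instance covers both Rec0 and Par0 ...
instance Fields1 (K1 i c) where
  fields1 _ = 1
-- ... and one covers D1, C1 and S1.
instance Fields1 f => Fields1 (M1 i c f) where
  fields1 (M1 x) = fields1 x

-- Hypothetical meta-information tags for a constructor with two fields.
data ExampleType
data MkExampleCon
data FirstField

example :: D1 ExampleType (C1 MkExampleCon (S1 FirstField (Rec0 Int) :*: Par0 Bool)) ()
example = M1 (M1 (M1 (K1 42) :*: K1 True))
```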

Note that we abstract over a single parameter p of kind *. This
means we will be able to express generic functions such as

fmap :: (α -> β) -> φ α -> φ β

but not

bimap :: (α -> γ) -> (β -> δ) -> φ α β -> φ γ δ

For bimap we need another type representation that can distinguish
between the parameters: all representation types would need to carry one
additional type argument. However, we think that, in practice, few
generic functions require abstraction over more than a single type
parameter.

2.2 Meta-information 

For some generic functions we need information about datatypes, 
constructors, and records. This information is stored in the type 
representation: 

2 We also have lifted void (V1) to represent nullary sums, but for simplicity
we omit it from this discussion and from the generic functions in Section 3.
class Datatype γ where
  datatypeName :: γ -> String
  moduleName :: γ -> String

class Selector γ where
  selName :: γ -> String
  selName = const ""

class Constructor γ where
  conName :: γ -> String
  conFixity :: γ -> Fixity
  conFixity = const Prefix
  conIsRecord :: γ -> Bool
  conIsRecord = const False

Names are unqualified. We provide the datatype name together with
the module name. This is the only meta-information we store for a
datatype, although it could easily be extended to add the kind, for
example. We only store the name of a selector. For a constructor, we
also store its fixity and mark whether it has record fields. This last
information is not strictly necessary, as it can be inferred by looking
for non-empty selNames, but it simplifies some generic function
definitions. The datatypes Fixity and Associativity are unsurprising:

data Fixity = Prefix | Infix Associativity Int
data Associativity = LeftAssociative | RightAssociative
                   | NotAssociative

We provide default definitions for conFixity and conIsRecord to
simplify instantiation for prefix constructors that do not use record
notation.³

Finally, we tie the meta-information to the representation:

instance (Datatype γ) => Datatype (M1 D γ φ p) where
  datatypeName = datatypeName . unMeta
  moduleName = moduleName . unMeta
instance (Constructor γ) => Constructor (M1 C γ φ p) where
  conName = conName . unMeta
  conFixity = conFixity . unMeta      -- otherwise the default Prefix would be used
  conIsRecord = conIsRecord . unMeta  -- otherwise the default False would be used
instance (Selector γ) => Selector (M1 S γ φ p) where
  selName = selName . unMeta
unMeta :: M1 ι γ φ p -> γ
unMeta = ⊥

Function unMeta operates at the type level only, so it does not need
an implementation. We provide more details in Section 4.5, and the
examples later in Section 2.4 and Section 3.6 also clarify how we
use these classes.
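The sketch below shows the type-level trick in isolation. unMeta is never evaluated, because the meta-information instances ignore their argument, so undefined is a safe implementation (the paper writes ⊥). Note that the Constructor instance forwards conFixity and conIsRecord as well as conName; without that, the class defaults (Prefix and False) would be returned for every M1 C value. The sketch assumes GHC with FlexibleInstances, since the instance heads mention the concrete tags D and C; the tags ExpType and PlusCon and the infix fixity given to Plus are hypothetical.

```haskell
{-# LANGUAGE FlexibleInstances #-}

data Fixity = Prefix | Infix Associativity Int deriving (Eq, Show)
data Associativity = LeftAssociative | RightAssociative | NotAssociative
  deriving (Eq, Show)

class Datatype d where
  datatypeName :: d -> String
  moduleName   :: d -> String

class Constructor c where
  conName     :: c -> String
  conFixity   :: c -> Fixity
  conFixity   = const Prefix
  conIsRecord :: c -> Bool
  conIsRecord = const False

data U1 p = U1
newtype M1 i c f p = M1 { unM1 :: f p }
data D
data C

-- Only the type of the result matters; the value is never demanded.
unMeta :: M1 i c f p -> c
unMeta = undefined

instance Datatype d => Datatype (M1 D d f p) where
  datatypeName = datatypeName . unMeta
  moduleName   = moduleName . unMeta

instance Constructor c => Constructor (M1 C c f p) where
  conName     = conName . unMeta
  conFixity   = conFixity . unMeta
  conIsRecord = conIsRecord . unMeta

-- Hypothetical meta-information for a datatype Exp with an infix Plus.
data ExpType
data PlusCon

instance Datatype ExpType where
  datatypeName _ = "Exp"
  moduleName _   = "Main"

instance Constructor PlusCon where
  conName _   = "Plus"
  conFixity _ = Infix LeftAssociative 6
```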

Note that we could encode the meta-information as an extra
argument to M1:

data M1 ι γ φ p = M1 Meta (φ p)
data Meta = Meta String Fixity ...

However, with this encoding we have trouble writing generic
producers, since when we are producing an M1 we have to produce
a Meta for which we have no information. With the above
representation we avoid this problem by using type classes to fill in the
right information for us. See Section 3.5 for an example of how this
works.

3 We also provide an empty default selName because all constructor
arguments will be wrapped in an S1, independently of using record notation or
not. We omit this in the example representations of this section for space
reasons, but it becomes clear in Section 4.


2.3 A generic view on data 

We obtain a generic view on data by defining an embedding-projection
pair between a datatype and its type representation. We
use the following classes for this purpose:

class Representable0 α τ where
  from0 :: α -> τ χ
  to0 :: τ χ -> α

class Representable1 φ τ where
  from1 :: φ p -> τ p
  to1 :: τ p -> φ p

We use τ to encode the representation of a standard type. Since τ is
built from representation types, it is functorial. In Representable1,
we encode types of kind * -> *, so we have the parameter p. In
Representable0 there is no parameter, so we invent a variable χ
which is never used.

All types need to have an instance of Representable0. Types of
kind * -> * also need an instance of Representable1. This separation
is necessary because some generic functions (like fmap or
traverse) require explicit abstraction from a single type parameter,
whereas others (like show or enum) do not. Given the different
kinds involved, it is unavoidable to have two type classes for this
representation. Note, however, that we have a single set of
representation types (apart from the duplication for tagging recursion
and parameters).

Avoiding extensions Since we want to avoid using advanced
Haskell extensions such as type families (Schrijvers et al. 2008)
or functional dependencies (Jones 2000), we use a simple
multi-parameter type class for embedding-projection pairs. In fact, τ is
uniquely determined by α (and φ). We could encode the
representation type more naturally with a type family:

class Representable0 α where
  type Rep0 α :: * -> *
  from0 :: α -> Rep0 α χ
  to0 :: Rep0 α χ -> α

Since type families and functional dependencies are not yet part
of any Haskell standard, we do not use them. Instead, we use
multi-parameter type classes, and solve the ambiguities that arise
by coercing with asTypeOf.
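A small sketch of this multi-parameter encoding and of the asTypeOf coercion, assuming GHC with MultiParamTypeClasses, FlexibleInstances and FlexibleContexts (the last two because instance heads and constraints mention representation types rather than bare variables). The Bool instance and the helpers firstConstructor and roundTrip are hypothetical.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

data U1 p = U1
data (f :+: g) p = L1 (f p) | R1 (g p)
infixr 5 :+:

-- Nothing in the class links a to tau, so from0 alone is ambiguous.
class Representable0 a tau where
  from0 :: a -> tau x
  to0   :: tau x -> a

instance Representable0 Bool (U1 :+: U1) where
  from0 False = L1 U1
  from0 True  = R1 U1
  to0 (L1 U1) = False
  to0 (R1 U1) = True

-- The first argument is never evaluated (callers pass undefined);
-- asTypeOf uses it only to fix the representation type of from0 v.
firstConstructor :: Representable0 a (f :+: g) => (f :+: g) x -> a -> Bool
firstConstructor rep v = case from0 v `asTypeOf` rep of
  L1 _ -> True
  R1 _ -> False

roundTrip :: Representable0 a tau => tau x -> a -> a
roundTrip rep v = to0 (from0 v `asTypeOf` rep)
```

With a type family or functional dependency, the extra argument would be unnecessary; the price of staying closer to Haskell 98 is this explicit, never-evaluated witness.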

2.4 Example representations 

We now show how to represent some standard datatypes. Note
that all the code in this section is automatically generated by the
compiler, as described in Section 4.

Representing Exp. The meta-information for datatype Exp looks
as follows:

data $Exp
data $ConstExp
data $PlusExp

instance Datatype $Exp where
  moduleName _ = "ModuleName"
  datatypeName _ = "Exp"

instance Constructor $ConstExp where conName _ = "Const"
instance Constructor $PlusExp where conName _ = "Plus"

In moduleName, "ModuleName" is the name of the module where
Exp lives. The particular datatypes we use for representing the
meta-information at the type level are not needed for defining
generic functions, so they are not visible to the user. In this
paper, we prefix them with a $.

The type representation ties the meta-information to the sum- 
of-products representation of Exp: 

















type Rep0Exp =
  D1 $Exp (  C1 $ConstExp (Rec0 Int)
           + C1 $PlusExp (Rec0 Exp × Rec0 Exp))

Note that the representation is shallow: at the recursive occurrences
we use Exp, and not Rep0Exp.

The embedding-projection pair implements the isomorphism
between Exp and Rep0Exp:

instance Representable0 Exp Rep0Exp where
  from0 (Const n) = M1 (L1 (M1 (K1 n)))
  from0 (Plus e e') = M1 (R1 (M1 (K1 e × K1 e')))
  to0 (M1 (L1 (M1 (K1 n)))) = Const n
  to0 (M1 (R1 (M1 (K1 e × K1 e')))) = Plus e e'

Here it is clear that from0 and to0 are inverses: the pattern of from0
is the same as the expression in to0, and vice versa.
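The Exp representation can be transcribed into a runnable sketch, assuming GHC with TypeOperators, MultiParamTypeClasses and FlexibleInstances, and using plain names (MetaExp, MetaConst, MetaPlus, all hypothetical) in place of the paper's $-prefixed meta-information types. The checks at the end test the property stated above: to0 undoes from0.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = f p :*: g p
infixr 5 :+:
infixr 6 :*:
newtype K1 i c p = K1 c
newtype M1 i c f p = M1 (f p)

data D
data C
data R
type D1 = M1 D
type C1 = M1 C
type Rec0 = K1 R

class Representable0 a tau where
  from0 :: a -> tau x
  to0   :: tau x -> a

data Exp = Const Int | Plus Exp Exp deriving (Eq, Show)

-- Stand-ins for the paper's $Exp, $ConstExp and $PlusExp.
data MetaExp
data MetaConst
data MetaPlus

-- Shallow: the recursive fields are Rec0 Exp, not the representation.
type Rep0Exp =
  D1 MetaExp (    C1 MetaConst (Rec0 Int)
              :+: C1 MetaPlus  (Rec0 Exp :*: Rec0 Exp))

instance Representable0 Exp Rep0Exp where
  from0 (Const n)   = M1 (L1 (M1 (K1 n)))
  from0 (Plus e e') = M1 (R1 (M1 (K1 e :*: K1 e')))
  to0 (M1 (L1 (M1 (K1 n))))           = Const n
  to0 (M1 (R1 (M1 (K1 e :*: K1 e')))) = Plus e e'
```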

Representing lists. The representation for a type of kind * -> *
requires an instance for both Representable1 and Representable0.
For lists

data List p = Nil | Cons p (List p) deriving (Show, Encode)

we generate the following code:

type Rep0List p =
  D1 $List (  C1 $NilList U1
            + C1 $ConsList (Par0 p × Rec0 (List p)))

instance Representable0 (List p) (Rep0List p) where
  from0 Nil = M1 (L1 (M1 U1))
  from0 (Cons h t) = M1 (R1 (M1 (K1 h × K1 t)))
  to0 (M1 (L1 (M1 U1))) = Nil
  to0 (M1 (R1 (M1 (K1 h × K1 t)))) = Cons h t

We omit the definitions for the meta-information, which are similar
to the previous example. We use Par0 to tag the parameter p, as we
view lists as a kind * datatype for Representable0. This is different
in the Representable1 instance:

type Rep1List = D1 $List (  C1 $NilList U1
                          + C1 $ConsList (Par1 × Rec1 List))

instance Representable1 List Rep1List where
  from1 Nil = M1 (L1 (M1 U1))
  from1 (Cons h t) = M1 (R1 (M1 (Par1 h × Rec1 t)))
  to1 (M1 (L1 (M1 U1))) = Nil
  to1 (M1 (R1 (M1 (Par1 h × Rec1 t)))) = Cons h t

We treat parameters and recursion differently in Rep0List and Rep1List.
In Rep0List we use Par0 and Rec0 for mere tagging; in Rep1List we use
Par1 and Rec1 instead, which store the parameter and the recursive
occurrence of a type constructor, respectively. We will see later
when defining generic functions (Section 3) how these are used.

Representing type composition. We now present a larger example,
involving more complex datatypes, to show the expressiveness
of our approach. Datatype Expr represents abstract syntax trees of
a small language:

infixr 6 :*:

data Expr p = Const Int
            | Expr p :*: Expr p
            | Var_Expr {unVar :: Var p}
            | Let [Decl p] (Expr p)

data Decl p = Decl (Var p) (Expr p)

data Var p = Var p | Var_L (Var [p])

Note that Expr makes use of an infix constructor (:*:), has a selector
(unVar), and uses lists in Let. Datatype Var is nested, since in the
Var_L constructor Var is called with [p]. These oddities are present
only for illustrating how our approach represents them. We show
only the essentials of the encoding of this set of mutually recursive
datatypes, starting with the meta-information:
data $TimesExpr
data $Var_ExprExpr
data $UnVar

instance Constructor $TimesExpr where
  conName _ = ":*:"
  conFixity _ = Infix RightAssociative 6

instance Constructor $Var_ExprExpr where
  conName _ = "Var_Expr"
  conIsRecord _ = True

instance Selector $UnVar where selName _ = "unVar"

We have to store the fixity of the :*: constructor, and also the fact that
Var_Expr has a record. We store its name in the instance for Selector,
and tie the meta-information to the representation:

type Rep1Expr = D1 $Expr
  (  (  C1 $ConstExpr (Rec0 Int)
      + C1 $TimesExpr (Rec1 Expr × Rec1 Expr))
   + (  C1 $Var_ExprExpr (S1 $UnVar (Rec1 Var))
      + C1 $LetExpr (([] ∘ Rec1 Decl) × Rec1 Expr)))

In Rep1Expr we see the use of S1. Also interesting is the representation
of the Let constructor: the list datatype is applied not to the
parameter p but to Decl p, so we use composition to denote this.
Note also that we are using a balanced encoding for the sums (and
also for the products). This improves the performance of the
type-checker, and makes generic encoding more space-efficient, for
instance.
We omit the representation for Decl. For Var we use composition
again:

type Rep1Var = D1 $Var
  (  C1 $VarVar Par1
   + C1 $Var_LVar (Var ∘ Rec1 []))

In the Var_L constructor, Var is applied to [p]. We represent this as
a composition with Rec1 [].

When we use composition, the embedding-projection pairs become
slightly more complicated:

instance Representable1 Expr Rep1Expr where
  from1 (Const i) = M1 (L1 (L1 (M1 (K1 i))))
  from1 (e1 :*: e2) = M1 (L1 (R1 (M1 (Rec1 e1 × Rec1 e2))))
  from1 (Var_Expr v) = M1 (R1 (L1 (M1 (M1 (Rec1 v)))))
  from1 (Let d e) =
    M1 (R1 (R1 (M1 (Comp1 (fmap Rec1 d) × Rec1 e))))
  to1 (M1 (L1 (L1 (M1 (K1 i))))) = Const i
  to1 (M1 (L1 (R1 (M1 (Rec1 e1 × Rec1 e2))))) = e1 :*: e2
  to1 (M1 (R1 (L1 (M1 (M1 (Rec1 v)))))) = Var_Expr v
  to1 (M1 (R1 (R1 (M1 (Comp1 d × Rec1 e))))) =
    Let (fmap unRec1 d) e

We need to use fmap to apply the Rec1 constructor inside the lists.
In this case we could use map instead, but in general we require the
first argument to ∘ to have a Functor instance so we can use fmap.
In to1, we need to convert back, this time mapping unRec1.




For Var, the embedding-projection pair is similar:

instance Representable1 Var Rep1Var where
  from1 (Var x) = M1 (L1 (M1 (Par1 x)))
  from1 (Var_L xs) = M1 (R1 (M1 (Comp1 (fmap Rec1 xs))))
  to1 (M1 (L1 (M1 (Par1 x)))) = Var x
  to1 (M1 (R1 (M1 (Comp1 xs)))) = Var_L (fmap unRec1 xs)
Note that composition is used both in the representation for the
first argument of constructor Let (of type [Decl p]) and in the nested
recursion of Var_L (of type Var [p]). In both cases, we have a
recursive occurrence of a parametrized datatype where the parameter is
not just the variable p. Recall our definition of composition:

newtype (∘) φ ψ p = Comp1 (φ (ψ p))

The type φ is applied not to p, but to the result of applying ψ to p.
This is why we use ∘ when the recursive argument to a datatype is
not p, like in [Decl p] and Var [p]. When it is p, we can simply
use Rec1.

We have seen how to represent many features of Haskell
datatypes in our approach. We give a detailed discussion of the
supported datatypes in Section 7.1.

3. Generic functions 

In this section we show how to define type classes with derivable 
functions. 

3.1 Generic function definition 

Function encode is a method of a type class:

data Bit = O | I  -- O and I stand for the bits 0 and 1

class Encode a where
  encode :: a -> [Bit]

We cannot provide instances of Encode for our representation
types, as those have kind * -> *, and Encode expects a parameter of
kind *. We therefore define a helper class, this time parametrized
over a variable of kind * -> *:

class Encode1 φ where
  encode1 :: φ χ -> [Bit]

For constructors without arguments we return the empty list, as
there is nothing to encode. Meta-information is discarded:

instance Encode1 U1 where
  encode1 _ = []

instance (Encode1 φ) => Encode1 (M1 ι γ φ) where
  encode1 (M1 a) = encode1 a

For a value of a sum type we produce a single bit to record the
choice. For products we concatenate the encoding of each element:

instance (Encode1 φ, Encode1 ψ) => Encode1 (φ + ψ) where
  encode1 (L1 a) = O : encode1 a
  encode1 (R1 a) = I : encode1 a
instance (Encode1 φ, Encode1 ψ) => Encode1 (φ × ψ) where
  encode1 (a × b) = encode1 a ++ encode1 b

It remains to encode constants. Since constant types have kind *,
we resort to Encode:

instance (Encode φ) => Encode1 (K1 ι φ) where
  encode1 (K1 a) = encode a

Note that while the instances for the representation types are given
for the Encode1 class, only the Encode class is exported and allowed
to be derived. This is because its type is more general, and
because we need a two-level approach to deal with recursion: for
the K1 instance, we recursively call encode instead of encode1.
Recall our representation for Exp (simplified and with type synonyms
expanded):

type Rep0Exp = K1 R Int + K1 R Exp × K1 R Exp

Since Int and Exp appear as arguments to K1, and our instance
of Encode1 for K1 ι φ requires an instance of Encode φ, we need
instances of Encode for Int and for Exp. We deal with Int in the next
section, and Exp in Section 3.3. Finally, note that we do not need
Encode1 instances for Rec1, Par1 or (∘). These are only required
for generic functions which make use of the Representable1 class.
We will see an example in Section 3.4.
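Putting the pieces of this subsection together gives a runnable sketch of generic encode for the simplified Exp representation. It assumes GHC with TypeOperators, MultiParamTypeClasses and FlexibleInstances, writes the bits as constructors O and I, drops the M1 wrappers as in the simplified Rep0Exp above, and uses encodeDefault as introduced in Section 3.3. The Encode Int instance, a fixed four-bit encoding of small non-negative numbers, is a hypothetical stand-in for the base-type instances the paper leaves unspecified.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

data Bit = O | I deriving (Eq, Show)  -- O is the bit 0, I is the bit 1

data U1 p = U1
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = f p :*: g p
infixr 5 :+:
infixr 6 :*:
newtype K1 i c p = K1 c
data R
type Rec0 = K1 R

class Representable0 a tau where
  from0 :: a -> tau x
  to0   :: tau x -> a

class Encode a where
  encode :: a -> [Bit]

class Encode1 f where
  encode1 :: f x -> [Bit]

instance Encode1 U1 where
  encode1 _ = []
instance (Encode1 f, Encode1 g) => Encode1 (f :+: g) where
  encode1 (L1 a) = O : encode1 a   -- one bit records the choice
  encode1 (R1 a) = I : encode1 a
instance (Encode1 f, Encode1 g) => Encode1 (f :*: g) where
  encode1 (a :*: b) = encode1 a ++ encode1 b
instance Encode c => Encode1 (K1 i c) where
  encode1 (K1 a) = encode a        -- back to the kind-* class

encodeDefault :: (Representable0 a tau, Encode1 tau) => tau x -> a -> [Bit]
encodeDefault rep x = encode1 (from0 x `asTypeOf` rep)

-- Hypothetical base-type instance: four bits, most significant first.
instance Encode Int where
  encode n = [if odd (n `div` (2 ^ k)) then I else O | k <- [3, 2, 1, 0 :: Int]]

data Exp = Const Int | Plus Exp Exp

type Rep0Exp = Rec0 Int :+: (Rec0 Exp :*: Rec0 Exp)

instance Representable0 Exp Rep0Exp where
  from0 (Const n)   = L1 (K1 n)
  from0 (Plus e e') = R1 (K1 e :*: K1 e')
  to0 (L1 (K1 n))           = Const n
  to0 (R1 (K1 e :*: K1 e')) = Plus e e'

instance Encode Exp where
  encode = encodeDefault (undefined :: Rep0Exp ())
```

The K1 instance is where the two levels meet: encode1 walks the representation, and encode re-enters the datatype at each Exp field, so encode (Plus (Const 1) (Const 2)) is one I for the Plus alternative followed by the encodings of the two subterms.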

3.2 Base types 

We have to provide the instances of Encode for the base types: 
instance Encode Int where encode = ... 
instance Encode Char where encode = ... 

Since Encode is exported, a user can also provide additional base 
type instances, or ad-hoc instances (types for which the required 
implementation is different from the derived generic behavior). 

3.3 Default definition 

We are still missing an instance of Encode for Exp. Instances of
generic functions for representable types rely on the embedding-projection
pair to convert from/to the type representation and then apply the
generic function:

encodeDefault :: (Representable0 α τ, Encode1 τ)
              => τ χ -> α -> [Bit]
encodeDefault rep x = encode1 ((from0 x) `asTypeOf` rep)

Function encodeDefault tells the compiler what to fill in for the
instance of each of the derived types. Because we do not want
to use functional dependencies for portability reasons, we pass
the representation type explicitly to function encodeDefault. This
function uses the representation type to coerce the result type of
from0 with asTypeOf. This slight complication is a small price to
pay for extended portability.

Now we can show the instance of Encode for Exp and List:

instance Encode Exp where
  encode = encodeDefault (⊥ :: Rep0Exp χ)
instance (Encode p) => Encode (List p) where
  encode = encodeDefault (⊥ :: Rep0List p χ)

Both instances look similar and trivial. However, the instance for
List requires scoped type variables to type-check. We can avoid the
need for scoped type variables if we create an auxiliary local function
encodeList, with the same type and behavior as encodeDefault:

instance (Encode p) => Encode (List p) where
  encode = encodeList ⊥ where
    encodeList :: (Encode p) => Rep0List p χ -> List p -> [Bit]
    encodeList = encodeDefault

Here, the local function encodeList encodes in its type the
correspondence between the type List p and its representation
Rep0List p. Its type signature is required, but can easily be obtained
from the type of encodeDefault by replacing the type variables α and τ
with the concrete types for this instance.

For completeness, we give the instance for Exp in the same
fashion:

instance Encode Exp where
  encode = encodeExp ⊥ where
    encodeExp :: Rep0Exp χ -> Exp -> [Bit]
    encodeExp = encodeDefault
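The local-function trick can be checked with a self-contained sketch, assuming the same GHC extensions as before (TypeOperators, MultiParamTypeClasses, FlexibleInstances) and the O/I spelling of bits. No ScopedTypeVariables is needed: the signature of the local encodeList quantifies over its own variable (named q here to make that visible), and it is linked to the instance's p only through the application encodeList undefined. The Encode Bool instance is a hypothetical base case.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

data Bit = O | I deriving (Eq, Show)  -- O is the bit 0, I is the bit 1

data U1 p = U1
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = f p :*: g p
infixr 5 :+:
infixr 6 :*:
newtype K1 i c p = K1 c
data R
data P
type Rec0 = K1 R
type Par0 = K1 P

class Representable0 a tau where
  from0 :: a -> tau x
  to0   :: tau x -> a

class Encode a where
  encode :: a -> [Bit]

class Encode1 f where
  encode1 :: f x -> [Bit]

instance Encode1 U1 where
  encode1 _ = []
instance (Encode1 f, Encode1 g) => Encode1 (f :+: g) where
  encode1 (L1 a) = O : encode1 a
  encode1 (R1 a) = I : encode1 a
instance (Encode1 f, Encode1 g) => Encode1 (f :*: g) where
  encode1 (a :*: b) = encode1 a ++ encode1 b
instance Encode c => Encode1 (K1 i c) where
  encode1 (K1 a) = encode a

encodeDefault :: (Representable0 a tau, Encode1 tau) => tau x -> a -> [Bit]
encodeDefault rep x = encode1 (from0 x `asTypeOf` rep)

data List p = Nil | Cons p (List p)

-- Kind-* view of List: the parameter is merely tagged with Par0.
type Rep0List p = U1 :+: (Par0 p :*: Rec0 (List p))

instance Representable0 (List p) (Rep0List p) where
  from0 Nil        = L1 U1
  from0 (Cons h t) = R1 (K1 h :*: K1 t)
  to0 (L1 U1)              = Nil
  to0 (R1 (K1 h :*: K1 t)) = Cons h t

instance Encode p => Encode (List p) where
  encode = encodeList undefined where
    -- q is local to this signature; unification at the call links it to p.
    encodeList :: Encode q => Rep0List q () -> List q -> [Bit]
    encodeList = encodeDefault

-- Hypothetical base-type instance.
instance Encode Bool where
  encode False = [O]
  encode True  = [I]
```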

It might seem strange that we choose not to use Haskell’s built-in
functionality for default definitions for class methods. Unfortunately
we cannot use default methods, for two reasons:

1. Since we avoid using type families and functional dependencies,
we need to explicitly pass the representation type as an
argument to encodeDefault.

2. A default case would force us to move the Representable0 α τ
and Encode1 τ class constraints to the Encode class, possibly
preventing ad-hoc instances for non-representable types and
exposing Encode1 to the user.

However, if the compiler is to generate instances for Exp and 
other representable datatypes automatically, how does it know 
which function to use as default? The alternative to standard 
Haskell default methods is to use a naming convention for this 
function (like appending Default to the class function name, as in 
our example). It is more reliable to use a pragma: 

{-# DERIVABLE Encode encode encodeDefault #-}

This pragma takes three arguments, which represent (respectively): 

1. The class which we are defining as derivable 

2. The method of the class which is generic (and therefore needs 
a default definition) 

3. The name of the function which serves as a default definition 

Such a pragma also has the advantage of indicating derivability 
for a particular class. We could use a keyword such as derivable to 
signal that a class is allowed to be derived: 

derivable class Encode a where... 

However, by using a pragma instead (as described above) we ensure 
more portability, as compilers without support for our derivable 
type classes can still compile the code. 

Since a class can have multiple generic methods, multiple pragmas
can be used for this purpose. Note, however, that a derivable
class can only have non-generic methods if there is a default
definition for these, as otherwise we have no means for implementing
the non-generic methods. Alternatively, we could treat generic
methods as default methods, filling in the generic definition
automatically if the user does not give a definition. This would allow
classes to have normal, generic, and default methods. However, it
would complicate the code generation mechanism.

3.4 Generic map 

In this subsection we define the generic map function fmap, which 
implements the Prelude’s fmap. Function fmap requires access to 
the parameter in the representation type. As before, we export a 
single class together with an internal class where we define the 
generic instances: 

class Functor φ where
  fmap :: (p -> α) -> φ p -> φ α

class Functor1 φ where
  fmap1 :: (p -> α) -> φ p -> φ α

Unlike in Encode, the type arguments to Functor and Functor1
have the same kind, so we do not really need two classes. However,
for consistency, we use the same style as for kind * generic
functions.

We apply the argument function in the parameter case:

instance Functor1 Par1 where
  fmap1 f (Par1 a) = Par1 (f a)

Unit and constant values do not change, as there is nothing we can
map over. We apply fmap1 recursively to meta-information, sums
and products:

instance Functor1 U1 where
  fmap1 _ U1 = U1

instance Functor1 (K1 ι γ) where
  fmap1 _ (K1 a) = K1 a

instance (Functor1 φ) => Functor1 (M1 ι γ φ) where
  fmap1 f (M1 a) = M1 (fmap1 f a)
instance (Functor1 φ, Functor1 ψ) => Functor1 (φ + ψ) where
  fmap1 f (L1 a) = L1 (fmap1 f a)
  fmap1 f (R1 a) = R1 (fmap1 f a)
instance (Functor1 φ, Functor1 ψ) => Functor1 (φ × ψ) where
  fmap1 f (a × b) = fmap1 f a × fmap1 f b

If we find a recursive occurrence of a functorial type, we call fmap
again, to tie the recursive knot:

instance (Functor φ) => Functor1 (Rec1 φ) where
  fmap1 f (Rec1 a) = Rec1 (fmap f a)

The remaining case is composition:

instance (Functor φ, Functor1 ψ) => Functor1 (φ ∘ ψ) where
  fmap1 f (Comp1 x) = Comp1 (fmap (fmap1 f) x)

Recall that we require the first argument of (∘) to be a user-defined
datatype, and the second to be a representation type. Therefore, we
use fmap1 for the inner mapping (as it will map over a representation
type) but fmap for the outer mapping (as it will require an
embedding-projection pair). This is the general structure of the
instance of (∘) for a generic function.

Finally, we define the default method:

{-# DERIVABLE Functor fmap fmapDefault #-}
fmapDefault :: (Representable1 φ τ, Functor1 τ)
            => τ p -> (p -> α) -> φ p -> φ α
fmapDefault rep f x = to1 (fmap1 f (from1 x `asTypeOf` rep))

Now Functor can be derived for user-defined datatypes. The usual
restrictions apply: only types with at least one type parameter and
whose last type argument is of kind * can derive Functor. The
compiler derives the following instance for List:

instance Functor List where
  fmap = fmapList (⊥ :: Rep1List p) where
    fmapList :: Rep1List p -> (p -> α) -> List p -> List α
    fmapList = fmapDefault

Note that the instance Functor List also guarantees that we can use
List as the first argument to (∘), as the embedding-projection pairs
for such compositions need to use fmap.

The instances derived for Expr, Decl, and Var are similar.
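The following sketch assembles the Functor1 instances, fmapDefault and the derived List instance into one module, and adds a hypothetical rose tree whose children are stored in a List, so that the composition instance and the required Functor List instance are both exercised. It assumes GHC with TypeOperators, MultiParamTypeClasses and FlexibleInstances, hides the Prelude's Functor so that the paper's class can be defined under the same name, and omits the M1 and K1 cases (and all meta-information) for brevity.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}
import Prelude hiding (Functor (..))

data U1 p = U1
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = f p :*: g p
infixr 5 :+:
infixr 6 :*:
newtype Par1 p = Par1 p
newtype Rec1 f p = Rec1 { unRec1 :: f p }
newtype (f :.: g) p = Comp1 (f (g p))

class Functor f where
  fmap :: (p -> a) -> f p -> f a

class Functor1 f where
  fmap1 :: (p -> a) -> f p -> f a

instance Functor1 U1 where
  fmap1 _ U1 = U1
instance Functor1 Par1 where
  fmap1 f (Par1 a) = Par1 (f a)                -- the only place f is applied
instance (Functor1 g, Functor1 h) => Functor1 (g :+: h) where
  fmap1 f (L1 a) = L1 (fmap1 f a)
  fmap1 f (R1 a) = R1 (fmap1 f a)
instance (Functor1 g, Functor1 h) => Functor1 (g :*: h) where
  fmap1 f (a :*: b) = fmap1 f a :*: fmap1 f b
instance Functor g => Functor1 (Rec1 g) where
  fmap1 f (Rec1 a) = Rec1 (fmap f a)           -- tie the recursive knot
instance (Functor g, Functor1 h) => Functor1 (g :.: h) where
  fmap1 f (Comp1 x) = Comp1 (fmap (fmap1 f) x) -- outer fmap, inner fmap1

class Representable1 f tau where
  from1 :: f p -> tau p
  to1   :: tau p -> f p

fmapDefault :: (Representable1 f tau, Functor1 tau) => tau p -> (p -> a) -> f p -> f a
fmapDefault rep f x = to1 (fmap1 f (from1 x `asTypeOf` rep))

data List p = Nil | Cons p (List p) deriving (Eq, Show)
type Rep1List = U1 :+: (Par1 :*: Rec1 List)

instance Representable1 List Rep1List where
  from1 Nil        = L1 U1
  from1 (Cons h t) = R1 (Par1 h :*: Rec1 t)
  to1 (L1 U1)                  = Nil
  to1 (R1 (Par1 h :*: Rec1 t)) = Cons h t

instance Functor List where
  fmap = fmapList undefined where
    fmapList :: Rep1List q -> (q -> b) -> List q -> List b
    fmapList = fmapDefault

-- Hypothetical rose tree: List is the first argument of the composition.
data Rose p = Rose p (List (Rose p)) deriving (Eq, Show)
type Rep1Rose = Par1 :*: (List :.: Rec1 Rose)

instance Representable1 Rose Rep1Rose where
  from1 (Rose x ts) = Par1 x :*: Comp1 (fmap Rec1 ts)  -- needs Functor List
  to1 (Par1 x :*: Comp1 ts) = Rose x (fmap unRec1 ts)

instance Functor Rose where
  fmap = fmapRose undefined where
    fmapRose :: Rep1Rose q -> (q -> b) -> Rose q -> Rose b
    fmapRose = fmapDefault
```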

3.5 Generic empty 

We can also easily express generic producers: functions which 
produce data. We will illustrate this with function empty, which 
produces a single value of a given type: 
class Empty a where empty :: a

This function is perhaps the simplest generic producer, as it
consumes no data. It relies only on the structure of the datatype to
produce values. Other examples of generic producers are the methods
in Read, the Arbitrary class from QuickCheck, and binary's get. As
usual, we define an auxiliary type class:


class Empty1 φ where
  empty' :: φ x

Most instances of Empty1 are straightforward:

instance Empty1 U1 where
  empty' = U1
instance (Empty1 φ) => Empty1 (M1 i c φ) where
  empty' = M1 empty'
instance (Empty1 φ, Empty1 ψ) => Empty1 (φ :*: ψ) where
  empty' = empty' :*: empty'
instance (Empty c) => Empty1 (K1 i c) where
  empty' = K1 empty

For units we can only produce U1. Meta-information is produced
with M1, and since we encode the meta-information using type
classes (instead of using extra arguments to M1) we do not have to
use ⊥ here. An empty product is the product of empty components,
and for K1 we recursively call empty. The only interesting choice
is for the sum type:

instance (Empty1 φ) => Empty1 (φ :+: ψ) where
  empty' = L1 empty'

In a sum, we always take the leftmost constructor for the empty
value. Since the leftmost constructor might be recursive, function
empty might not terminate. More complex implementations can look
ahead to spot recursion, or choose alternative constructors after
recursive calls, for instance. Note also the similarity between our
Empty class and Haskell's Bounded: if we were defining minBound
and maxBound generically, we could choose L1 for minBound and
R1 for maxBound. This way we would preserve the semantics
for derived Bounded instances, as defined by Peyton Jones et al.
(2003), while at the same time lifting the restrictions on types that
can derive Bounded. Alternatively, to keep the Haskell 98 behavior,
we could give no instance for (:*:), as enumeration types will not
have a product in their representations.
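The remark about Bounded can be made concrete with a small sketch. It assumes cut-down representation types and a hypothetical helper class GBounded1 (not part of the text): minBound1 picks L1 at every sum and maxBound1 picks R1. Following the Haskell 98 variant, there is deliberately no instance for (:*:), so only enumeration types such as the illustrative Color get an instance.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

-- Cut-down representation types (assumed for this sketch).
data U1 p = U1
newtype M1 i c f p = M1 (f p)
data (f :+: g) p = L1 (f p) | R1 (g p)
infixr 5 :+:
data D
data C

class Representable0 a rep where
  from0 :: a -> rep x
  to0   :: rep x -> a

-- Hypothetical helper class: the bounds of a representation type.
class GBounded1 f where
  minBound1 :: f x   -- built from the leftmost constructor
  maxBound1 :: f x   -- built from the rightmost constructor
instance GBounded1 U1 where
  minBound1 = U1
  maxBound1 = U1
instance GBounded1 f => GBounded1 (M1 i c f) where
  minBound1 = M1 minBound1
  maxBound1 = M1 maxBound1
instance (GBounded1 f, GBounded1 g) => GBounded1 (f :+: g) where
  minBound1 = L1 minBound1
  maxBound1 = R1 maxBound1
-- No instance for products: only enumeration types qualify, as in Haskell 98.

minBoundDefault, maxBoundDefault :: (Representable0 a rep, GBounded1 rep) => rep x -> a
minBoundDefault rep = to0 (minBound1 `asTypeOf` rep)
maxBoundDefault rep = to0 (maxBound1 `asTypeOf` rep)

data Color = Red | Green | Blue deriving (Eq, Show)
type Rep0_Color = M1 D () (M1 C () U1 :+: (M1 C () U1 :+: M1 C () U1))

instance Representable0 Color Rep0_Color where
  from0 Red   = M1 (L1 (M1 U1))
  from0 Green = M1 (R1 (L1 (M1 U1)))
  from0 Blue  = M1 (R1 (R1 (M1 U1)))
  to0 (M1 (L1 (M1 U1)))      = Red
  to0 (M1 (R1 (L1 (M1 U1)))) = Green
  to0 (M1 (R1 (R1 (M1 U1)))) = Blue

instance Bounded Color where
  minBound = minBoundDefault (undefined :: Rep0_Color x)
  maxBound = maxBoundDefault (undefined :: Rep0_Color x)
```

Adding an instance for (:*:) that combines the bounds of both components would lift the enumeration-only restriction, as the text suggests.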

The default method simply applies to0 to empty':

{-# DERIVABLE Empty empty emptyDefault #-}
emptyDefault :: (Representable0 α τ, Empty1 τ)
             => τ x -> α
emptyDefault rep = to0 (empty' `asTypeOf` rep)

Now the compiler can produce instances such as: 
instance Empty Exp where
  empty = emptyExp undefined where
    emptyExp :: Rep0_Exp x -> Exp
    emptyExp = emptyDefault
instance (Empty p) => Empty (List p) where
  empty = emptyList undefined where
    emptyList :: (Empty p) => Rep0_List p x -> List p
    emptyList = emptyDefault

Instances for other types are similar.
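The following self-contained sketch runs the Empty1 instances in GHC. It assumes cut-down representation types with placeholder tags (D, C, S, and R for constant arguments), adds an Empty Int base case returning 0 that is our own choice, and introduces an illustrative Point type to exercise the product case; the instance for Exp has the same shape as the generated one above.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

-- Cut-down representation types (assumed, not the library's exact definitions).
data U1 p = U1
newtype K1 i c p = K1 c
newtype M1 i c f p = M1 (f p)
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = f p :*: g p
infixr 5 :+:
infixr 6 :*:
data D
data C
data S
data R   -- placeholder tag for constant (Rec0) arguments

class Representable0 a rep where
  from0 :: a -> rep x
  to0   :: rep x -> a

class Empty a where
  empty :: a
class Empty1 f where
  empty' :: f x

instance Empty1 U1 where empty' = U1
instance Empty1 f => Empty1 (M1 i c f) where empty' = M1 empty'
instance (Empty1 f, Empty1 g) => Empty1 (f :*: g) where empty' = empty' :*: empty'
instance Empty c => Empty1 (K1 i c) where empty' = K1 empty
instance Empty1 f => Empty1 (f :+: g) where empty' = L1 empty'  -- leftmost constructor

emptyDefault :: (Representable0 a rep, Empty1 rep) => rep x -> a
emptyDefault rep = to0 (empty' `asTypeOf` rep)

instance Empty Int where empty = 0   -- assumed base case

data Exp = Const Int | Plus Exp Exp deriving (Eq, Show)
type Rep0_Exp = M1 D () (M1 C () (M1 S () (K1 R Int))
  :+: M1 C () (M1 S () (K1 R Exp) :*: M1 S () (K1 R Exp)))

instance Representable0 Exp Rep0_Exp where
  from0 (Const n)  = M1 (L1 (M1 (M1 (K1 n))))
  from0 (Plus a b) = M1 (R1 (M1 (M1 (K1 a) :*: M1 (K1 b))))
  to0 (M1 (L1 (M1 (M1 (K1 n)))))               = Const n
  to0 (M1 (R1 (M1 (M1 (K1 a) :*: M1 (K1 b))))) = Plus a b

instance Empty Exp where
  empty = emptyExp undefined where
    emptyExp :: Rep0_Exp x -> Exp
    emptyExp = emptyDefault

-- Illustrative single-constructor type with two fields.
data Point = Point Int Int deriving (Eq, Show)
type Rep0_Point = M1 D () (M1 C () (M1 S () (K1 R Int) :*: M1 S () (K1 R Int)))

instance Representable0 Point Rep0_Point where
  from0 (Point x y) = M1 (M1 (M1 (K1 x) :*: M1 (K1 y)))
  to0 (M1 (M1 (M1 (K1 x) :*: M1 (K1 y)))) = Point x y

instance Empty Point where
  empty = emptyDefault (undefined :: Rep0_Point x)
```

Because Exp lists the non-recursive Const first, empty terminates here; swapping the constructors would make it loop, as the text warns.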

3.6 Generic show 

To illustrate the use of constructor and selector labels, we define
the shows function generically:

class Show a where
  shows :: a -> ShowS
  show :: a -> String
  show x = shows x ""

We define a helper class Show1, with shows1 as the only method.
For each representation type there is an instance of Show1. The
extra Bool argument will be explained later. Datatype
meta-information and sums are ignored. For units we have nothing
to show, and for constants we call shows recursively:

class Show1 φ where
  shows1 :: Bool -> φ x -> ShowS

instance (Show1 φ) => Show1 (M1 D γ φ) where
  shows1 b (M1 a) = shows1 b a
instance (Show1 φ, Show1 ψ) => Show1 (φ :+: ψ) where
  shows1 b (L1 a) = shows1 b a
  shows1 b (R1 a) = shows1 b a
instance Show1 U1 where
  shows1 _ U1 = id

instance (Show c) => Show1 (K1 i c) where
  shows1 _ (K1 a) = shows a

The most interesting instances are for the meta-information of a
constructor and a selector. For simplicity, we always place
parentheses around a constructor and ignore infix operators. We do
display a labeled constructor with record notation. At the
constructor level, we use conIsRecord to decide if we print
surrounding braces or not. We use the Bool argument to shows1 to
encode that we are inside a labeled field, as we will need this for
the product case:

instance (Show1 φ, Constructor γ) => Show1 (M1 C γ φ) where
  shows1 _ c@(M1 a) =
      showString "(" . showString (conName c)
    . showString " "
    . wrapRecord (shows1 (conIsRecord c) a)
    . showString ")"
    where
      wrapRecord :: ShowS -> ShowS
      wrapRecord s | conIsRecord c = showString "{ " . s . showString " }"
                   | otherwise     = s

For a selector, we print its label (as long as it is not empty), followed 
by an "=" and the value. In the product, we use the Bool to decide 
if we print a space (unlabeled constructors) or a comma: 

instance (Show1 φ, Selector γ) => Show1 (M1 S γ φ) where
  shows1 b s@(M1 a)
    | null (selName s) = shows1 b a
    | otherwise        = showString (selName s)
                       . showString " = " . shows1 b a
instance (Show1 φ, Show1 ψ) => Show1 (φ :*: ψ) where
  shows1 b (a :*: c) = shows1 b a
                     . showString (if b then ", " else " ")
                     . shows1 b c

Finally, we provide the default: 

{-# DERIVABLE Show shows showsDefault #-}
showsDefault :: (Representable0 α τ, Show1 τ)
             => τ x -> α -> ShowS
showsDefault rep x = shows1 False (from0 x `asTypeOf` rep)

We have shown how to use meta-information to define a generic 
show function. If we additionally account for infix constructors 
and operator precedence for avoiding unnecessary parentheses, 
we obtain a formal specification of how show behaves on every 
Haskell 98 datatype. 
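The following self-contained sketch runs these instances in GHC. To avoid clashing with the Prelude's Show, it names the classes GShow and GShow1 (hypothetical names); the meta-information classes are cut down to the methods used here, tag types such as Con_Plus stand in for the generated $Con_i datatypes, and the GShow Int base case is our own assumption. The closing parenthesis is emitted after the record braces, so records print as (Point { px = 1, py = 2 }).

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances, KindSignatures #-}
import Data.Kind (Type)

-- Cut-down representation types (assumed for this sketch).
data U1 p = U1
newtype K1 i c p = K1 c
newtype M1 i c f p = M1 (f p)
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = f p :*: g p
infixr 5 :+:
infixr 6 :*:
data D
data C
data S
data R

class Representable0 a rep where
  from0 :: a -> rep x
  to0   :: rep x -> a

-- Cut-down meta-information classes, with defaults as in Section 4.5.
class Constructor c where
  conName     :: t c (f :: Type -> Type) x -> String
  conIsRecord :: t c (f :: Type -> Type) x -> Bool
  conIsRecord _ = False
class Selector s where
  selName :: t s (f :: Type -> Type) x -> String
  selName _ = ""

class GShow a where                       -- stands in for the paper's Show
  gshows :: a -> ShowS
instance GShow Int where gshows = shows   -- assumed base case

class GShow1 f where
  gshows1 :: Bool -> f x -> ShowS
instance GShow1 f => GShow1 (M1 D c f) where gshows1 b (M1 a) = gshows1 b a
instance (GShow1 f, GShow1 g) => GShow1 (f :+: g) where
  gshows1 b (L1 a) = gshows1 b a
  gshows1 b (R1 a) = gshows1 b a
instance GShow1 U1 where gshows1 _ U1 = id
instance GShow c => GShow1 (K1 i c) where gshows1 _ (K1 a) = gshows a
instance (GShow1 f, Constructor c) => GShow1 (M1 C c f) where
  gshows1 _ m@(M1 a) = showString "(" . showString (conName m) . showString " "
                     . wrapRecord (gshows1 (conIsRecord m) a) . showString ")"
    where wrapRecord s | conIsRecord m = showString "{ " . s . showString " }"
                       | otherwise     = s
instance (GShow1 f, Selector s) => GShow1 (M1 S s f) where
  gshows1 b m@(M1 a)
    | null (selName m) = gshows1 b a
    | otherwise        = showString (selName m) . showString " = " . gshows1 b a
instance (GShow1 f, GShow1 g) => GShow1 (f :*: g) where
  gshows1 b (a :*: c) = gshows1 b a . showString (if b then ", " else " ") . gshows1 b c

showsDefault :: (Representable0 a rep, GShow1 rep) => rep x -> a -> ShowS
showsDefault rep x = gshows1 False (from0 x `asTypeOf` rep)

data Exp = Const Int | Plus Exp Exp
data Point = Point { px :: Int, py :: Int }

-- Hand-written stand-ins for the generated meta-information datatypes.
data Con_Const
data Con_Plus
data Con_Point
data NoSel
data Sel_px
data Sel_py
instance Constructor Con_Const where conName _ = "Const"
instance Constructor Con_Plus where conName _ = "Plus"
instance Constructor Con_Point where
  conName _ = "Point"
  conIsRecord _ = True
instance Selector NoSel
instance Selector Sel_px where selName _ = "px"
instance Selector Sel_py where selName _ = "py"

type Rep0_Exp = M1 D () (M1 C Con_Const (M1 S NoSel (K1 R Int))
  :+: M1 C Con_Plus (M1 S NoSel (K1 R Exp) :*: M1 S NoSel (K1 R Exp)))
type Rep0_Point = M1 D () (M1 C Con_Point (M1 S Sel_px (K1 R Int) :*: M1 S Sel_py (K1 R Int)))

instance Representable0 Exp Rep0_Exp where
  from0 (Const n)  = M1 (L1 (M1 (M1 (K1 n))))
  from0 (Plus a b) = M1 (R1 (M1 (M1 (K1 a) :*: M1 (K1 b))))
  to0 (M1 (L1 (M1 (M1 (K1 n)))))               = Const n
  to0 (M1 (R1 (M1 (M1 (K1 a) :*: M1 (K1 b))))) = Plus a b
instance Representable0 Point Rep0_Point where
  from0 (Point x y) = M1 (M1 (M1 (K1 x) :*: M1 (K1 y)))
  to0 (M1 (M1 (M1 (K1 x) :*: M1 (K1 y)))) = Point x y

instance GShow Exp where gshows = showsDefault (undefined :: Rep0_Exp x)
instance GShow Point where gshows = showsDefault (undefined :: Rep0_Point x)
```

With these definitions, gshows (Plus (Const 1) (Const 2)) "" produces "(Plus (Const 1) (Const 2))".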





4. Compiler support 

We now describe in detail the required compiler support for our 
generic deriving mechanism. 

We start by defining two predicates on types, isRep0 (φ)
and isRep1 (φ), which hold if φ can be made an instance of
Representable0 and Representable1, respectively. The statement
isRep0 (φ) holds if φ is any of the following:

1. A regular Haskell 98 datatype without context 

2. An empty datatype 

3. A type variable of kind * 

We also require that for every type ψ that appears as an argument
to a constructor of φ, isRep0 (ψ) holds. φ cannot use existential
quantification, type equalities, or any other extensions.

The statement isRep1 (φ) holds if both of the following conditions
hold:

1. isRep0 (φ)

2. φ is of kind * -> * or κ -> * -> *, for any kind κ

Note that isRep0 holds for all the types of Section 2.4, while isRep1
holds for List, Expr, Decl, and Var.

Furthermore, we define the predicate ground (φ) to determine
whether or not a datatype has type variables. For instance,
ground ([Int]) holds, but ground ([a]) does not. Finally, we assume
the existence of an indexed fresh variable generator fresh p_i^j,
which binds p_i^j to a unique fresh variable.
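As a concrete reading of these two definitions, the sketch below models type expressions with a small hypothetical Ty datatype (not UHC's internal representation) and implements ground over it. The function fresh is likewise a hypothetical naming scheme that derives one variable name per index pair (i, j), which is enough to keep the generated variables of one instance distinct.

```haskell
-- Hypothetical model of type expressions, as a deriving pass might see them.
data Ty
  = TVar String   -- a type variable, such as a
  | TCon String   -- a type constructor, such as Int or []
  | TApp Ty Ty    -- type application, such as [] Int
  deriving (Eq, Show)

-- ground: the type mentions no type variables at all.
ground :: Ty -> Bool
ground (TVar _)   = False
ground (TCon _)   = True
ground (TApp f a) = ground f && ground a

-- Hypothetical fresh-variable scheme: one name per (constructor i, field j).
-- The separator keeps, e.g., (1, 12) and (11, 2) apart.
fresh :: Int -> Int -> String
fresh i j = "p_" ++ show i ++ "_" ++ show j
```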

For the remainder of this section, we consider a user-defined
datatype

data D α_1 ... α_n = Con_1 {l_1^1 :: p_1^1, ..., l_1^{o_1} :: p_1^{o_1}}
                   | ...
                   | Con_m {l_m^1 :: p_m^1, ..., l_m^{o_m} :: p_m^{o_m}}

with n type parameters, m constructors and possibly labeled
parameter l_i^j of type p_i^j at position j of constructor Con_i.

4.1 Type representation (kind *)

In Figure 1 we show how we generate type representations for
a datatype D satisfying isRep0 (D). We generate a number of
empty datatypes which we use in the meta-information: one for the
datatype, one for each constructor, and one for each argument to a
constructor.

The type representation is a type synonym (Rep0_D) with as many
type variables as D. It is a wrapped sum of wrapped products: the
wrapping encodes the meta-information. We wrap all arguments to
constructors, even if the constructor is not a record. Since we use
a balanced sum (resp. product) encoding, a generic function can
use the meta-information to find out when the sum (resp. product)
structure ends, which is when we reach C1 (resp. S1). Each
argument is tagged with Par0 if it is one of the type variables, or
Rec0 if it is anything else (type application or a concrete datatype).
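The scheme of Figure 1 can be read as a small string-producing program. The sketch below is our own illustration, not UHC code: it takes a hypothetical description of a datatype (constructor names and argument types as strings), splits sums and products at ⌊n/2⌋ as Σ and Π do, and tags each argument with Par0 or Rec0. The names $L_Con_j are an assumed spelling of the generated $L_i^j datatypes.

```haskell
-- Hypothetical input: a constructor's name and its argument types.
data Con = Con String [String]

-- Parenthesize unless atomic or already wrapped by balanced.
paren :: String -> String
paren s
  | ' ' `notElem` s || take 1 s == "(" = s
  | otherwise                          = "(" ++ s ++ ")"

-- The balanced nesting shared by Σ (sums) and Π (products) in Figure 1:
-- no elements give the unit, one element is itself, otherwise split at ⌊n/2⌋.
balanced :: String -> String -> [String] -> String
balanced unit _  []  = unit
balanced _    _  [x] = x
balanced unit op xs  =
  "(" ++ balanced unit op l ++ " " ++ op ++ " " ++ balanced unit op r ++ ")"
  where (l, r) = splitAt (length xs `div` 2) xs

-- Figure 1 for a datatype d with type variables tvs.
repType :: String -> [String] -> [Con] -> String
repType d tvs cons =
  "type Rep0_" ++ d ++ concatMap (' ' :) tvs ++ " = D1 $" ++ d ++ " "
    ++ paren (balanced "V1" ":+:" (map conRep cons))
  where
    conRep (Con c ps) =
      "C1 $" ++ c ++ " " ++ paren (balanced "U1" ":*:" (zipWith (selRep c) [1 :: Int ..] ps))
    selRep c j p = "S1 $L_" ++ c ++ "_" ++ show j ++ " " ++ paren (arg p)
    arg p
      | p `elem` tvs = "Par0 " ++ p          -- one of the datatype's type variables
      | otherwise    = "Rec0 " ++ paren p    -- anything else
```

For List, repType "List" ["a"] [Con "Nil" [], Con "Cons" ["a", "List a"]] yields type Rep0_List a = D1 $List (C1 $Nil U1 :+: C1 $Cons (S1 $L_Cons_1 (Par0 a) :*: S1 $L_Cons_2 (Rec0 (List a)))).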

4.2 Representable0 instance

The Representable0 instance for D is defined in Figure 2, as
introduced in Section 2. The patterns of the from0 function are the
constructors of the datatype applied to fresh variables. The same
patterns become expressions in function to0. The patterns of to0
are also the same as the expressions of from0, and they represent
the different values of a balanced sum of balanced products,
properly wrapped to account for the meta-information. Note that, for
Representable0, the functions tuple and wrap do not behave
differently depending on whether we are in from0 or to0, so for these
declarations the dir argument is not needed. Similarly, the wrap
function could have been inlined. These definitions will be refined
in Section 4.4.
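To make Figure 2 concrete, the sketch below writes out by hand what the scheme produces for an illustrative three-constructor datatype Shape (not from the text), using cut-down representation types. The helper inj mirrors inj_{i,m} at the level of strings, so the L1/R1 paths in from0 can be checked against it; the three fields of Tri are split at ⌊3/2⌋ = 1, as tuple prescribes.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

-- Cut-down representation types (assumed for this sketch).
data U1 p = U1
newtype K1 i c p = K1 c
newtype M1 i c f p = M1 (f p)
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = f p :*: g p
infixr 5 :+:
infixr 6 :*:
data D
data C
data S
data R

class Representable0 a rep where
  from0 :: a -> rep x
  to0   :: rep x -> a

-- inj_{i,m} from Figure 2, producing the injection as a string (illustration only).
inj :: Int -> Int -> String -> String
inj i m x
  | m <= 0    = error "inj: no constructors"
  | m == 1    = x
  | i <= m'   = "L1 (" ++ inj i m' x ++ ")"
  | otherwise = "R1 (" ++ inj (i - m') (m - m') x ++ ")"
  where m' = m `div` 2

data Shape = Dot | Circle Double | Tri Double Double Double deriving (Eq, Show)

type Field = M1 S () (K1 R Double)
type Rep0_Shape = M1 D ()
  (M1 C () U1 :+: (M1 C () Field :+: M1 C () (Field :*: (Field :*: Field))))

instance Representable0 Shape Rep0_Shape where
  from0 Dot         = M1 (L1 (M1 U1))                      -- inj_{1,3} x = L1 (x)
  from0 (Circle r)  = M1 (R1 (L1 (M1 (M1 (K1 r)))))        -- inj_{2,3} x = R1 (L1 (x))
  from0 (Tri a b c) = M1 (R1 (R1 (M1 (M1 (K1 a) :*: (M1 (K1 b) :*: M1 (K1 c))))))
  to0 (M1 (L1 (M1 U1)))                                            = Dot
  to0 (M1 (R1 (L1 (M1 (M1 (K1 r))))))                              = Circle r
  to0 (M1 (R1 (R1 (M1 (M1 (K1 a) :*: (M1 (K1 b) :*: M1 (K1 c))))))) = Tri a b c
```

The patterns of to0 are exactly the expressions of from0, which is what makes the generated pair mutually inverse.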

4.3 Type representation (kind * -> *)

See Figure 3 for the type representation of type constructors.
We keep the sum-of-products structure and meta-information
unchanged. At the arguments, however, we can use Par0, Par1, Rec0,
Rec1, or composition. We use Par1 for the type variable α_n, and
Par0 for other type variables of kind *. A recursive occurrence of
a type containing α_n is marked with Rec1. A recursive occurrence
of a type with no type variables is marked with Rec0, as there is
no variable to abstract from. Finally, for a recursive occurrence of
a type which contains something other than α_n, we use composition,
and recursively analyze the contained type.

4.4 Representable1 instance

The definition of the embedding-projection pair for kind * -> *
datatypes, shown in Figure 4, reflects the more complicated type
representation. The patterns are unchanged. However, the
expressions in to1 need some additional unwrapping. This is encoded
in var and unwC: an application to a type variable other than α_n has
been encoded as a composition, so we need to unwrap the elements
of the contained type. We use fmap for this purpose: since we
require isRep1 (φ), we know that we can use fmap (see Section 3.4).
The user should always derive Functor for container types, as these
can appear to the left of a composition.

Unwrapping is dual to wrapping: we use Par1 for the type
parameter α_n, Rec1 for containers of α_n, K1 for other type
parameters and ground types, and composition for application to
types other than α_n. Considering composition, in to1 we generate
only Comp1 applied to a fresh variable, as this is a pattern; the
necessary unwrapping of the contained elements is performed in the
right-hand side expression. In from1 the contained elements are
tagged properly: this is performed by wC_α.
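The sketch below applies Figures 3 and 4 by hand to an illustrative rose-tree type (not from the text). The field [Rose a] applies [] to something other than a, so it becomes the composition [] :.: Rec1 Rose; from1 tags the list elements with fmap Rec1 (the wC step) and to1 strips them with fmap unRec1 (the unwC step). A generic fmap over the result then yields the Functor instance. The cut-down representation types and the name Rep1_Rose are assumptions of the sketch.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

-- Cut-down representation types (assumed for this sketch).
newtype Par1 p = Par1 p
newtype Rec1 f p = Rec1 { unRec1 :: f p }
newtype M1 i c f p = M1 (f p)
data (f :*: g) p = f p :*: g p
newtype (f :.: g) p = Comp1 (f (g p))
infixr 6 :*:
data D
data C
data S

class Functor1 f where
  fmap1 :: (a -> b) -> f a -> f b
instance Functor1 Par1 where fmap1 h (Par1 a) = Par1 (h a)
instance Functor1 f => Functor1 (M1 i c f) where fmap1 h (M1 a) = M1 (fmap1 h a)
instance (Functor1 f, Functor1 g) => Functor1 (f :*: g) where
  fmap1 h (a :*: b) = fmap1 h a :*: fmap1 h b
instance Functor f => Functor1 (Rec1 f) where fmap1 h (Rec1 a) = Rec1 (fmap h a)
instance (Functor f, Functor1 g) => Functor1 (f :.: g) where
  fmap1 h (Comp1 x) = Comp1 (fmap (fmap1 h) x)

class Representable1 f rep where
  from1 :: f a -> rep a
  to1   :: rep a -> f a

data Rose a = Rose a [Rose a] deriving (Eq, Show)

-- Figure 3: a becomes Par1, [Rose a] becomes [] :.: Rec1 Rose.
type Rep1_Rose = M1 D () (M1 C () (M1 S () Par1 :*: M1 S () ([] :.: Rec1 Rose)))

-- Figure 4: wrap with wC = Rec1 inside the list, unwrap with unwC = unRec1.
instance Representable1 Rose Rep1_Rose where
  from1 (Rose a rs) = M1 (M1 (M1 (Par1 a) :*: M1 (Comp1 (fmap Rec1 rs))))
  to1 (M1 (M1 (M1 (Par1 a) :*: M1 (Comp1 rs)))) = Rose a (fmap unRec1 rs)

-- Generic fmap, in the shape of the derived instance.
instance Functor Rose where
  fmap = fmapRose undefined where
    fmapRose :: Rep1_Rose p -> (p -> b) -> Rose p -> Rose b
    fmapRose rep h x = to1 (fmap1 h (from1 x `asTypeOf` rep))
```

Both fmap Rec1 and fmap unRec1 are identities at runtime because Rec1 is a newtype.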

4.5 Meta-information

We generate three meta-information instances. For datatypes, we
generate

instance Datatype $D where
  moduleName _ = mName
  datatypeName _ = dName

where dName is a String with the unqualified name of datatype D
and mName is a String with the name of the module in which D is
defined.

For constructors, we generate

instance Constructor $Con_i where
  conName _ = name
  { conFixity _ = fixity }
  { conIsRecord _ = True }

where i ∈ {1..m}, and name is the unqualified name of constructor
Con_i. The braces around conFixity indicate that this method is
only defined if Con_i is an infix constructor. In that case, fixity
is Infix assoc prio, where prio is an integer denoting the priority
of Con_i, and assoc is one of LeftAssociative, RightAssociative, or
NotAssociative. These are derived from the declaration of Con_i as
an infix constructor. The braces around conIsRecord indicate that
this method is only defined if Con_i uses record notation.

For all i ∈ {1..m}, we generate

instance Selector $L_i^j {where selName _ = l_i^j}

where j ∈ {1..o_i}. The braces indicate that the instance is only
given a body if Con_i uses record notation. Otherwise, the default
implementation for selName is used, i.e. const "".
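Because $ cannot start a Haskell identifier, the sketch below spells the generated datatypes with hypothetical prefixes (D_, C_, S_) and shows what the scheme above would produce for two illustrative datatypes: Seq, with an infix constructor, and Pt, which uses record notation. The Fixity and Associativity types and the defaults (Prefix, False, and "") are our assumptions about the supporting definitions, chosen to match the description above.

```haskell
{-# LANGUAGE KindSignatures #-}
import Data.Kind (Type)

data Associativity = LeftAssociative | RightAssociative | NotAssociative
  deriving (Eq, Show)
data Fixity = Prefix | Infix Associativity Int deriving (Eq, Show)

-- Supporting classes; the defaults are assumptions matching Section 4.5.
class Datatype d where
  moduleName   :: t d (f :: Type -> Type) x -> String
  datatypeName :: t d (f :: Type -> Type) x -> String
class Constructor c where
  conName     :: t c (f :: Type -> Type) x -> String
  conFixity   :: t c (f :: Type -> Type) x -> Fixity
  conFixity _ = Prefix
  conIsRecord :: t c (f :: Type -> Type) x -> Bool
  conIsRecord _ = False
class Selector s where
  selName :: t s (f :: Type -> Type) x -> String
  selName _ = ""

-- Wrapper through which the meta-information is queried.
data U1 p = U1
newtype M1 i c f p = M1 (f p)
data D
data C
data S

-- User datatypes.
infixr 5 :>
data Seq a = Nil | a :> Seq a
data Pt = Pt { ptX :: Int, ptY :: Int }

-- Generated for Seq: unlabeled fields get body-less Selector instances.
data D_Seq
data C_Nil
data C_Cons
data S_Cons_1
data S_Cons_2
instance Datatype D_Seq where
  moduleName _ = "Main"
  datatypeName _ = "Seq"
instance Constructor C_Nil where conName _ = "Nil"
instance Constructor C_Cons where
  conName _ = ":>"
  conFixity _ = Infix RightAssociative 5
instance Selector S_Cons_1
instance Selector S_Cons_2

-- Generated for Pt (record notation).
data D_Pt
data C_Pt
data S_Pt_1
data S_Pt_2
instance Datatype D_Pt where
  moduleName _ = "Main"
  datatypeName _ = "Pt"
instance Constructor C_Pt where
  conName _ = "Pt"
  conIsRecord _ = True
instance Selector S_Pt_1 where selName _ = "ptX"
instance Selector S_Pt_2 where selName _ = "ptY"
```

The methods never inspect their argument; the value only carries the tag type, so M1 U1 at the right type suffices as a query.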








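data $D
data $Con_1    ...    data $Con_m
data $L_1^1    ...    data $L_m^{o_m}

type Rep0_D α_1 ... α_n = D1 $D (Σ_{i=1}^{m} (C1 $Con_i (Π_{j=1}^{o_i} (S1 $L_i^j (arg p_i^j)))))

Σ_{i=1}^{n} x | n ≡ 0     = V1
              | n ≡ 1     = x
              | otherwise = Σ_{i=1}^{m} x :+: Σ_{i=m+1}^{n} x
  where m = ⌊n/2⌋

Π_{i=1}^{n} x | n ≡ 0     = U1
              | n ≡ 1     = x
              | otherwise = Π_{i=1}^{m} x :*: Π_{i=m+1}^{n} x
  where m = ⌊n/2⌋

arg p_i^j | ∃k ∈ {1..n} : p_i^j ≡ α_k = Par0 p_i^j
          | otherwise                 = Rec0 p_i^j

Figure 1. Code generation for the type representation (kind *)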


instance Representable0 (D α_1 ... α_n) (Rep0_D α_1 ... α_n) where {
  from0 pat_1^from = exp_1^from;  to0 pat_1^to = exp_1^to;
  ...
  from0 pat_m^from = exp_m^from;  to0 pat_m^to = exp_m^to; }

exp_i^to   = pat_i^from = Con_i (fresh p_i^1) ... (fresh p_i^{o_i})
exp_i^from = pat_i^to   = M1 (inj_{i,m} (M1 (tuple_i^dir (p_i^1 ... p_i^{o_i}))))

inj_{i,m} x | m ≡ 0  = ⊥
            | m ≡ 1  = x
            | i ≤ m' = L1 (inj_{i,m'} x)
            | i > m' = R1 (inj_{i−m',m−m'} x)
  where m' = ⌊m/2⌋

tuple_i^dir (p_i^1 ... p_i^{o_i})
  | o_i ≡ 0   = U1
  | o_i ≡ 1   = M1 (wrap^dir (fresh p_i^1))
  | otherwise = (tuple_i^dir (p_i^1 ... p_i^k)) :*: (tuple_i^dir (p_i^{k+1} ... p_i^{o_i}))
  where k = ⌊o_i/2⌋

wrap^dir p = K1 p

Figure 2. Code generation for the Representable0 instance


type Rep1_D α_1 ... α_{n−1} = D1 $D (Σ_{i=1}^{m} (C1 $Con_i (Π_{j=1}^{o_i} (S1 $L_i^j (arg p_i^j)))))

arg p_i^j | ∃k ∈ {1..n−1} : p_i^j ≡ α_k          = Par0 p_i^j
          | p_i^j ≡ α_n                           = Par1
          | p_i^j ≡ φ α_n ∧ isRep1 (φ)             = Rec1 φ
          | p_i^j ≡ φ β ∧ isRep1 (φ) ∧ ¬ ground (β) = φ :.: arg β
          | otherwise                             = Rec0 p_i^j

Σ and Π as in Figure 1.

Figure 3. Code generation for the type representation (kind * -> *)


instance Representable1 (D α_1 ... α_{n−1}) (Rep1_D α_1 ... α_{n−1}) where {
  from1 pat_1^from = exp_1^from;  to1 pat_1^to = exp_1^to;
  ...
  from1 pat_m^from = exp_m^from;  to1 pat_m^to = exp_m^to; }

pat_i^dir, exp_i^from, inj_{i,m} x, and tuple_i^dir (p_i^1 ... p_i^{o_i}) as in
Figure 2 (but using the new wrap^dir x).

wrap^dir p_i^j | p_i^j ≡ α_n                = Par1 (fresh p_i^j)
               | p_i^j ≡ φ α_n ∧ isRep1 (φ)  = Rec1 (fresh p_i^j)
               | ∃k ∈ {1..n−1} : p_i^j ≡ α_k = K1 (fresh p_i^j)
               | p_i^j ≡ φ α ∧ ¬ isRep1 (φ)  = K1 (fresh p_i^j)
               | p_i^j ≡ φ α ∧ dir ≡ from    = Comp1 (fmap wC_α (fresh p_i^j))
               | otherwise                  = Comp1 (fresh p_i^j)

exp_i^to = Con_i (var p_i^1) ... (var p_i^{o_i})

var p_i^j | p_i^j ≡ φ α ∧ α ≢ α_n ∧ isRep1 (φ) = fmap unwC_α (fresh p_i^j)
          | otherwise                         = fresh p_i^j

unwC_α | α ≡ α_n               = unPar1
       | α ≡ φ α_n ∧ isRep1 (φ) = unRec1
       | α ≡ φ β ∧ ground (β)   = unRec0
       | α ≡ φ β ∧ isRep1 (φ)   = fmap unwC_β ∘ unComp1

wC_α | α ≡ α_n               = Par1
     | ground (α)            = K1
     | α ≡ φ α_n ∧ isRep1 (φ) = Rec1
     | α ≡ φ β ∧ isRep1 (φ)   = Comp1 ∘ (fmap wC_β)

Figure 4. Code generation for the Representable1 instance








4.6 Default instances

The instances of a class representing the different cases of a
generic function on representation types present somewhat more of
a challenge because they refer to a specific function defined by the
generic programmer (in our running example, encodeDefault). The
compiler knows which function to use due to the DERIVABLE pragma
(Section 3.3).

After the default function has been determined, the only other
concern is passing the explicit type representation, encoded as a
typed ⊥.

4.6.1 Generic functions on Representable0

For each generic function f that is a method of the type class F, and
for every datatype D with type arguments α_1 ... α_n and associated
representation type Rep0_D α_1 ... α_n χ, the compiler generates:

instance (C ...) => F (D α_1 ... α_n) where
  f = f_D undefined where
    f_D :: (C ...) => Rep0_D α_1 ... α_n χ -> β
    f_D = fDefault

The type β is the type of f specialized to D, and χ is a fresh
type variable. The context C is the same in the instance head and
in function f_D. The exact context generated depends on the way
the user specified the deriving. If deriving F was attached to the
datatype, we generate a context F ᾱ_1, ..., F ᾱ_n, where ᾱ_i is the
variable α_i applied to enough fresh type variables to achieve full
saturation. This approach gives the correct behavior for Haskell 98
derivable classes like Show. In general, however, it is not correct:
we cannot assume that we require F ᾱ_i for all i ∈ {1..n}; generic
children, for instance, does not require any constraints, as it is
not a recursive function. Worse still, we might require constraints
other than these, as a generic function can use other functions, for
instance.

To avoid these problems we can use the standalone deriving
extension. If we have a standalone deriving

deriving instance (C ...) => F (D α_1 ... α_n)

we can simply use this context for the instance. In general, however,
the compiler should be able to infer the right context by analyzing
the context of the generic function and the structure of the datatype.
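The point about contexts can be illustrated with a generic function that never inspects constructor fields. In the sketch below, conIndex (a hypothetical example, not from the text) returns the 0-based position of a value's constructor. Its constructor-level instance places no constraint on the fields, so the instance generated for the illustrative datatype Opt needs an empty context; the naive context ConIndex a would wrongly rule out uses such as Opt (Int -> Int). The representation types are cut down to what the example needs.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

-- Cut-down representation types (assumed for this sketch).
data U1 p = U1
newtype K1 i c p = K1 c
newtype M1 i c f p = M1 (f p)
data (f :+: g) p = L1 (f p) | R1 (g p)
infixr 5 :+:
data D
data C
data S
data P   -- placeholder tag for parameters (Par0)
data R   -- placeholder tag for other arguments (Rec0)

class Representable0 a rep where
  from0 :: a -> rep x
  to0   :: rep x -> a

class ConIndex a where
  conIndex :: a -> Int
class ConIndex1 f where
  conIndex1 :: f x -> Int   -- position of the value's constructor in this sub-sum
  conCount1 :: f x -> Int   -- constructors in this sub-sum; never forces its argument

instance ConIndex1 f => ConIndex1 (M1 D c f) where
  conIndex1 (M1 a) = conIndex1 a
  conCount1 (M1 a) = conCount1 a
instance ConIndex1 (M1 C c f) where   -- no constraint on the fields f
  conIndex1 _ = 0
  conCount1 _ = 1
instance (ConIndex1 f, ConIndex1 g) => ConIndex1 (f :+: g) where
  conIndex1 (L1 a)   = conIndex1 a
  conIndex1 s@(R1 b) = conCount1 (leftOf s) + conIndex1 b
  conCount1 s        = conCount1 (leftOf s) + conCount1 (rightOf s)

-- Type-level projections only; the results are never evaluated.
leftOf :: (f :+: g) x -> f x
leftOf _ = undefined
rightOf :: (f :+: g) x -> g x
rightOf _ = undefined

conIndexDefault :: (Representable0 a rep, ConIndex1 rep) => rep x -> a -> Int
conIndexDefault rep v = conIndex1 (from0 v `asTypeOf` rep)

data Opt a = None | Some a | Many [a]
type Rep0_Opt a = M1 D () (M1 C () U1
  :+: (M1 C () (M1 S () (K1 P a)) :+: M1 C () (M1 S () (K1 R [a]))))

instance Representable0 (Opt a) (Rep0_Opt a) where
  from0 None      = M1 (L1 (M1 U1))
  from0 (Some x)  = M1 (R1 (L1 (M1 (M1 (K1 x)))))
  from0 (Many xs) = M1 (R1 (R1 (M1 (M1 (K1 xs)))))
  to0 (M1 (L1 (M1 U1)))                = None
  to0 (M1 (R1 (L1 (M1 (M1 (K1 x))))))  = Some x
  to0 (M1 (R1 (R1 (M1 (M1 (K1 xs)))))) = Many xs

-- The shape of the generated instance, with an empty context.
instance ConIndex (Opt a) where
  conIndex = conIndexOpt undefined where
    conIndexOpt :: Rep0_Opt a x -> Opt a -> Int
    conIndexOpt = conIndexDefault
```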

4.6.2 Generic functions on Representable1

For each generic function f that is a method of the type class F, and
for every datatype D with type arguments α_1 ... α_n and associated
representation type Rep1_D α_1 ... α_{n−1}, the compiler generates:

instance (C ...) => F (D α_1 ... α_{n−1}) where
  f = f_D undefined where
    f_D :: (C ...) => Rep1_D α_1 ... α_n -> β
    f_D = fDefault

The type β is the type of f specialized to D (in other words,
f :: β). This code is almost the same as that for generic functions
on Representable0, with a small exception for handling the last
type variable (α_n). The context can be copied from the standalone
deriving, if one was used, or just inferred by the compiler.

4.7 UHC specifics 

We have a prototype implementation of our deriving mechanism 
in UHC. Although generating the required datatypes and instances 
is straightforward, we have to resolve some subtle issues. In our 
implementation, the following issues arose: 

Which stage of the compiler pipeline generates the datatypes and
instances? Ideally, all deriving-related code is generated as early
as possible, for example during desugaring, so later compiler stages
can type check the generated code. However, the generation needs
kind information of types and classes, which is only available after
kind checking. In UHC, the datatypes and instances are directly
generated as intermediate Core, directed by kind information, and
only the derived instances are intertwined with type checking and
context reduction because of the use of the default deriving
functions.

Use of fmap. The generation of embedding-projection pairs for
types with composition requires fmap, which in turn requires the
context reduction machinery to resolve overloading. This complicates
the interaction with the compiler pipeline, because the generation
becomes not only kind-directed, but also directed by context
reduction proofs. However, all occurrences of fmap are effectively
applied to the identity function id, because wrappers like Par1 are
defined as newtypes. In UHC, the use of context reduction is avoided
by assuming the equality fmap id = id.

Code size. Some quick measurements show a 10% increase in
the size of the generated code. Although language pragmas like
GenericDeriving and NoGenericDeriving could selectively switch
this feature on or off, this would defeat the purpose of genericity.
Once it is turned off for a datatype, no Representable instances are
generated, and no generic instances can be defined anymore.
Instead, later transformations should prune unused code. These
issues need further investigation.

Bootstrapping. As soon as a user defines a datatype, code
generation generates the supporting datatypes. Such datatypes (e.g.
$Con_1) and the datatypes used by supporting datatypes (e.g. Bool,
used in the return type of conIsRecord) are mutually dependent,
which is detected by binding group analysis. The type analysis of
each binding group must deal with mutually dependent datatypes.
This also means that the supporting definitions must be available in
the first module that contains a datatype.

Interaction with desugaring. Currently, deriving clauses are just
syntactic sugar for standalone deriving. After desugaring, we cannot
decide whether to generate a Representable0 or a Representable1
instance, because kind information is not available. Automatically
generating the correct context for such an instance cannot be done
either. To work around this limitation, we only accept deriving
clauses for generic classes that use Representable0. Derivings for
Representable1 classes have to use standalone deriving syntax,
since then we no longer need to infer a context, and can let the
programmer provide the required context.

5. Alternatives 

We have described how to implement a deriving mechanism that 
can be used to specify many datatype-generic functions in Haskell. 
There are other alternatives, of varying complexity and type-safety. 

5.1 Pre-processors 

The simplest, most powerful, and least type-safe alternative to
our approach is to implement deriving by pre-processing the
source file(s), analyzing the datatype definitions and generating
the required instances with a tool such as DrIFT (Winstanley and
Meacham 2008). This requires no work from the compiler writer,
but does not simplify the task of adding new derivable classes, as
programming by generating strings is not very convenient.

Staged meta-programming lies in between a pre-processor and
an embedded datatype-generic representation. GHC supports
Template Haskell (Sheard and Peyton Jones 2002), which has become
a standard tool for obtaining reflection in Haskell. While Template
Haskell provides possibly more flexibility than the purely
library-based approach we describe, it imposes a significant hurdle
on the compiler writer, who not only has to implement a language
for staged programming (if one does not yet exist for the compiler,
as in UHC), but also has to keep this complex component up-to-date
with the rest of the compiler as it evolves. As an example,
Template Haskell support for GADTs and type families only arrived
much later than the features themselves. Also, for the derivable
class writer, using Template Haskell is more cumbersome and
error-prone than writing a datatype-generic definition in Haskell
itself.

For these reasons we think that our library-based approach,
while having some limitations, has a good balance of expressive
power, type safety, and the amount of implementation work
required.

5.2 Generic programming libraries 

Another design choice we made was in the specific library approach
to use. We have decided not to use any of the existing libraries but
instead to develop yet another one. However, our library is merely a
variant of existing libraries, from which it borrows many ideas. We
see our representation as a mixture between regular (Van Noort
et al. 2008) and instant-generics (Chakravarty et al. 2009). We
share the functorial view with regular; however, we abstract from
a single type parameter, and not from the recursive occurrence. Our
library can also be seen as instant-generics extended with a
single type parameter. However, having one parameter allows us
to deal with composition effectively, and we do not duplicate the
representation for types without parameters.

Since we wanted to avoid using GADTs, and we wanted an
extensible approach, we had to exclude most of the other generic
programming libraries. The only possible choice would have been
EMGM (Oliveira et al. 2007), which supports type parameters, is
modular, and does not require fancy extensions. However, EMGM
duplicates the representation for higher arities, and encodes the
representation of a type at the value level. We prefer encoding the
representation only at the type level, as this has proven to allow for
type-indexed datatypes (see Section 7.2).

6. Related work 

The generic programming library we present shares many aspects
with regular (Van Noort et al. 2008) and instant-generics
(Chakravarty et al. 2009). Clean (Alimarine and Plasmeijer 2001)
has also integrated generic programming directly in the language.
We think our approach is more lightweight: we express our generic
functions almost entirely in Haskell and require only one small
syntactic extension. On the other hand, the approach taken in Clean
allows defining generic functions with polykinded types (Hinze
2002), which means that the function bimap (see Section 2.1), for
instance, can be defined. Not all Clean datatypes are supported:
quantified types, for example, cannot derive generic functions. Our
approach does not support all features of Haskell datatypes, but
most common datatypes and generic functions are supported.

An extension for derivable type classes similar to ours has
been developed by Hinze and Peyton Jones (2001) in GHC. As
in Clean, this extension requires special syntax for defining generic
functions, which makes it harder to implement and maintain. In
contrast, generic functions written in our approach are portable
across different compilers. Furthermore, Hinze and Peyton Jones's
approach cannot express functions such as fmap, as their type
representation does not abstract over type variables.

Rodriguez Yakushev et al. (2008) give criteria for comparing
generic programming libraries. These criteria consider the library's
use of types, and its expressiveness and usability. Regarding types,
our library scores very well: we can represent regular,
higher-kinded, nested, and mutually recursive datatypes. We can
also express subuniverses: generic functions are only applicable to
types that derive the corresponding class. We only miss the ability
to represent nested higher-kinded datatypes, as our representation
abstracts only over a parameter of kind *.

Regarding expressiveness, our library scores well for most
criteria: we can abstract over type constructors, give ad-hoc
definitions for datatypes, our approach is extensible, supports
multiple generic arguments, represents the constructor names, and
can express consumers, transformers, and producers. We cannot
express gmapQ in our approach, but our generic functions are still
first-class: we can call generic map with generic show as argument,
for instance. Ad-hoc definitions for constructors would be of the
form:

instance Show Exp where
  shows (Plus e1 e2) = shows e1 . showString "+" . shows e2
  shows x = showsDefault (undefined :: Rep0_Exp χ) x

However, in our current implementation, Rep0_Exp is an internal type
synonym not exposed to the user. Exposing it to the user would
require a naming convention. If UHC supported type families
(Schrijvers et al. 2008), Rep0 could be a visible type family, which
would solve our problem for ad-hoc definitions of constructors. It
would also remove the need for using asTypeOf in Section 2.3.

Regarding usability, our approach supports separate compilation,
is highly portable, has automatic generation of its two
representations, requires minimal work to instantiate and define a
generic function, is implemented in a compiler, and is easy to use.
We have not yet benchmarked our library in UHC. In GHC, we
believe it will be as efficient as instant-generics and regular.

7. Future work 

Our solution is applicable to a wide range of datatypes and can 
express many generic functions. However, some limitations still 
remain, and many improvements are possible. In this section we 
outline some possible directions for future research. 

7.1 Supported datatypes 

Our examples in Section 2 show that we can represent many
common forms of datatypes. We believe that we can represent all of
the Haskell 98 standard datatypes in Representable0, except for
constrained datatypes. We could easily support constrained
datatypes by propagating the constraints to the generic instances.

Regarding Representable1, we can represent many, but not all,
datatypes. Consider a nested datatype for representing balanced
trees:

data Perfect p = Node p | Perfect (Perfect (p, p))

We cannot give a representation of kind * -> * for Perfect, since
for the Perfect constructor we would need something like
Perfect :.: Rec1 ((,) p). However, the type variable p is no longer
available, because we abstract from it. This limitation is caused by
the fact that we abstract over a single type parameter. The approach
taken by Hesselink (2009) is more general and fits closely with our
approach, but it is not clear if it is feasible without advanced
language extensions.

Note that for this particular case we could use a datatype which
pairs elements of a single type:

data Pair p = Pair p p

If the Perfect constructor takes an argument of type
Perfect (Pair p) instead, its representation could then be
Perfect :.: Rec1 Pair.
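Assuming the Perfect constructor is changed to take a Perfect (Pair p), the sketch below writes out that representation by hand with cut-down representation types. The user datatype Perfect sits on the left of the composition, so from1 and to1 rely on its Functor instance (written by hand here), as Section 4.4 requires. The name Rep1_Perfect is illustrative.

```haskell
{-# LANGUAGE TypeOperators, MultiParamTypeClasses, FlexibleInstances #-}

-- Cut-down representation types (assumed for this sketch).
newtype Par1 p = Par1 p
newtype Rec1 f p = Rec1 { unRec1 :: f p }
newtype M1 i c f p = M1 (f p)
data (f :+: g) p = L1 (f p) | R1 (g p)
newtype (f :.: g) p = Comp1 (f (g p))
infixr 5 :+:
data D
data C
data S

class Representable1 f rep where
  from1 :: f a -> rep a
  to1   :: rep a -> f a

data Pair p = Pair p p deriving (Eq, Show)
instance Functor Pair where fmap f (Pair x y) = Pair (f x) (f y)

-- The nested datatype, using Pair instead of (p, p).
data Perfect p = Node p | Perfect (Perfect (Pair p)) deriving (Eq, Show)
instance Functor Perfect where
  fmap f (Node x)    = Node (f x)
  fmap f (Perfect t) = Perfect (fmap (fmap f) t)

type Rep1_Perfect = M1 D () (M1 C () (M1 S () Par1)
                         :+: M1 C () (M1 S () (Perfect :.: Rec1 Pair)))

instance Representable1 Perfect Rep1_Perfect where
  from1 (Node x)    = M1 (L1 (M1 (M1 (Par1 x))))
  from1 (Perfect t) = M1 (R1 (M1 (M1 (Comp1 (fmap Rec1 t)))))   -- wC = Rec1
  to1 (M1 (L1 (M1 (M1 (Par1 x)))))  = Node x
  to1 (M1 (R1 (M1 (M1 (Comp1 t))))) = Perfect (fmap unRec1 t)   -- unwC = unRec1
```

The parameter p never has to be named inside the representation, which is exactly what the (p, p) version could not avoid.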

7.2 Type-indexed datatypes 

Some generic functionality, like the zipper (Huet 1997) and generic
rewriting (Van Noort et al. 2008), requires not only type-indexed
functions but also type-indexed datatypes: types that depend on the
structure of other types (Hinze et al. 2002). We plan to investigate
how type-indexed datatypes can be integrated easily in our generic
deriving mechanism, while still avoiding advanced language
extensions.

7.3 Generic functions 

The representation types we propose limit the kind of generic
functions we can define. We can express the Haskell 98 standard
derivable classes Eq, Ord, Enum, Bounded, Show, and Read, even
lifting some of the restrictions imposed on the Enum and Bounded
instances. All of these are expressible for Representable0 types.
Using Representable1, we can implement Functor, as the parameter
of the Functor class is of kind * -> *. The same holds for
Foldable and Traversable. For Typeable we can express Typeable0
and Typeable1.

On the other hand, the Data class has very complex generic
functions which cannot be expressed with our representation.
Function gfoldl, for instance, requires access to the original datatype
constructor, something we cannot do with the current
representation. In the future we plan to explore if and how we can
change our representation to allow us to express more generic
functions.

7.4 Efficiency 

The instances derived in our approach are not specialized for a
datatype and may therefore incur an unacceptable performance
penalty. However, our recent research (Magalhães et al. 2010)
indicates that simple inlining and symbolic evaluation, present in
some form in every optimizing compiler, suffice in most cases to
optimize away all overhead from generic representations. We plan
to investigate how these optimizations can be expressed and
automatically applied without any user intervention in UHC.

7.5 Implementation in GHC 

Our approach is designed to be as portable as possible. Therefore,
we would like to implement it in other compilers, most importantly
in GHC. As a first step, we believe we can easily implement
most of our generic deriving mechanism in GHC using Template
Haskell. The code for the generic functions is kept intact: only the
DERIVABLE pragma needs a different syntax. For the user code, a
code splice would trigger the generation of generic representations
and function instances.

8. Conclusion 

We have shown how datatype-generic programming can be better
integrated in Haskell by revisiting the deriving mechanism. All
Haskell 98 derivable type classes can be expressed as generic
functions in our library, with the advantage of becoming easily
readable and portable. Additionally, many other type classes, such as
Functor and Typeable, can be declared derivable. Our extension
requires little extra syntax, so it is easy to implement. Adding new
generic derivings can be done by generic programmers in regular
Haskell; previously, this would be the compiler developer's task,
and would be done using code generation, which is more
error-prone and verbose.

We have implemented our solution in UHC and invite everyone
to derive instances for their favorite datatypes or even write their
own derivings. We hope our work paves the way for a redefinition
of the behavior of derived instances for Haskell Prime (Wallace
et al. 2007).